Twitter System Design
1. Requirements (~5 minutes)
Functional Requirements
- ✅ Users should be able to post tweets (text up to 280 characters, optionally with images/videos)
- ✅ Users should be able to follow/unfollow other users
- ✅ Users should be able to view timeline (home feed with tweets from followed users)
- ✅ Users should be able to like, retweet, and reply to tweets
- ✅ Users should be able to search tweets and trending topics
- ✅ Users should be able to receive real-time notifications
Non-functional Requirements
- ✅ The system should prioritize availability over consistency (eventual consistency is acceptable)
- ✅ The system should scale to support 500M+ users, 200M DAU
- ✅ Timeline should load in < 500ms (P95)
- ✅ Tweet posting should be fast (< 200ms acknowledgment)
- ✅ The system should be highly available (99.99% uptime)
- ✅ The system should handle viral tweets (millions of likes/retweets)
- ✅ The system should support real-time notifications (push/pull)
- ✅ The system should handle a read-heavy workload (per-tweet read:write ratio ≈ 100:1)
Capacity Estimation
Assumptions:
- Total Users: 500M
- Daily Active Users (DAU): 200M
- Average tweets per user per day: 2
- Average timeline views per user: 20
- Average tweet size: 300 bytes (text + metadata)
- Average followers per user: 200
- Media tweets: 20% (images/videos)
Daily Metrics:
Tweets per day = 200M × 2 = 400M tweets/day
Tweets per second = 400M / 86400 = ~4600 TPS
Timeline reads per day = 200M × 20 = 4B reads/day
Timeline QPS = 4B / 86400 = ~46K QPS
Read:Write Ratio = 4B / 400M = 10:1 at the request level; since each timeline read returns many tweets, the per-tweet ratio approaches 100:1 (higher still during viral events)
Storage:
Text tweets = 400M × 300 bytes = 120 GB/day
With metadata & index = 120 GB × 2 = 240 GB/day
Annual storage = 240 GB × 365 = ~88 TB/year
Media storage (20% of tweets):
80M media tweets/day × 2 MB avg = 160 TB/day
Bandwidth:
Upload: 4600 TPS × 300 bytes = 1.4 MB/s (text)
Upload: 920 media/sec × 2 MB = 1.8 GB/s (media)
Download: 46K QPS × 10 tweets × 300 bytes = 138 MB/s
2. Core Entities (~2 minutes)
User
userId, username, email, displayName, bio, profilePicUrl, createdAt, verified
Tweet
tweetId, userId, content, mediaUrls[], createdAt, likeCount, retweetCount, replyCount
Follow
followerId, followeeId, createdAt
Timeline
userId, tweetId, timestamp, score (for ranking)
Like
userId, tweetId, createdAt
Retweet
userId, originalTweetId, createdAt, comment (quote tweet)
Notification
notificationId, userId, type, actorId, tweetId, createdAt, read
3. API Interface (~5 minutes)
Protocol Choice
- REST for CRUD operations
- WebSocket for real-time notifications
- GraphQL (optional) for flexible data fetching
API Endpoints
Authentication
POST /v1/auth/login
Content-Type: application/json
{
"username": "john_doe",
"password": "securepass"
}
Response: {
"authToken": "jwt_token_here",
"userId": "user-123"
}
POST /v1/auth/register
Content-Type: application/json
{
"username": "john_doe",
"email": "john@example.com",
"password": "securepass"
}
Response: { "userId": "user-123" }
Tweet Operations
POST /v1/tweets
Authorization: Bearer <token>
Content-Type: application/json
{
"content": "Hello World!",
"mediaUrls": ["https://cdn.twitter.com/img123.jpg"]
}
Response: {
"tweetId": "tweet-123",
"createdAt": "2025-10-05T10:00:00Z"
}
GET /v1/tweets/{tweetId}
Response: {
"tweet": {
"tweetId": "tweet-123",
"userId": "user-456",
"username": "john_doe",
"content": "Hello World!",
"likeCount": 42,
"retweetCount": 10,
"createdAt": "2025-10-05T10:00:00Z"
}
}
DELETE /v1/tweets/{tweetId}
Authorization: Bearer <token>
Response: { "success": true }
Timeline (Home Feed)
GET /v1/timeline/home?cursor=tweet-100&limit=20
Authorization: Bearer <token>
Response: {
"tweets": [Tweet],
"nextCursor": "tweet-80"
}
GET /v1/timeline/user/{userId}?cursor=tweet-100&limit=20
Response: {
"tweets": [Tweet],
"nextCursor": "tweet-80"
}
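Both timeline endpoints page with an opaque cursor rather than an offset, so pagination stays stable while new tweets arrive. A minimal sketch of one possible cursor encoding (the exact format is an implementation detail and an assumption here): Base64 over the (createdAt, tweetId) of the last tweet returned, which maps directly onto a range query against the clustering key of the timeline table.
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public final class TimelineCursor {
    public final long createdAtMillis;
    public final String tweetId;

    public TimelineCursor(long createdAtMillis, String tweetId) {
        this.createdAtMillis = createdAtMillis;
        this.tweetId = tweetId;
    }

    // Opaque token handed to clients as "nextCursor"
    public String encode() {
        String raw = createdAtMillis + ":" + tweetId;
        return Base64.getUrlEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    public static TimelineCursor decode(String cursor) {
        String raw = new String(Base64.getUrlDecoder().decode(cursor), StandardCharsets.UTF_8);
        String[] parts = raw.split(":", 2);
        return new TimelineCursor(Long.parseLong(parts[0]), parts[1]);
    }
}
// The next page is then a single-partition range query, e.g.:
// SELECT * FROM home_timeline WHERE user_id = ?
//   AND (created_at, tweet_id) < (?, ?) LIMIT 20;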
Social Interactions
POST /v1/tweets/{tweetId}/like
Authorization: Bearer <token>
Response: { "success": true, "newLikeCount": 43 }
DELETE /v1/tweets/{tweetId}/like
Authorization: Bearer <token>
Response: { "success": true, "newLikeCount": 42 }
POST /v1/tweets/{tweetId}/retweet
Authorization: Bearer <token>
Content-Type: application/json
{
"comment": "Great insight!" // optional for quote tweet
}
Response: {
"retweetId": "retweet-789",
"success": true
}
POST /v1/tweets/{tweetId}/reply
Authorization: Bearer <token>
Content-Type: application/json
{
"content": "I agree!"
}
Response: {
"replyTweetId": "tweet-456"
}
Follow Operations
POST /v1/users/{userId}/follow
Authorization: Bearer <token>
Response: { "success": true }
DELETE /v1/users/{userId}/follow
Authorization: Bearer <token>
Response: { "success": true }
GET /v1/users/{userId}/followers?cursor=user-100&limit=50
Response: {
"users": [User],
"nextCursor": "user-50"
}
GET /v1/users/{userId}/following?cursor=user-100&limit=50
Response: {
"users": [User],
"nextCursor": "user-50"
}
Search & Trends
GET /v1/search/tweets?query=system+design&limit=20
Response: {
"tweets": [Tweet],
"count": 1523
}
GET /v1/trends?location=US
Response: {
"trends": [
{ "topic": "#WorldCup", "tweetCount": 2500000 },
{ "topic": "AI", "tweetCount": 1200000 }
]
}
Notifications
GET /v1/notifications?cursor=notif-100&limit=20
Authorization: Bearer <token>
Response: {
"notifications": [Notification],
"nextCursor": "notif-80",
"unreadCount": 5
}
PUT /v1/notifications/{notificationId}/read
Authorization: Bearer <token>
Response: { "success": true }
4. Data Flow (~5 minutes)
Tweet Post Flow
- User Posts Tweet: Client sends POST request with content
- Validate: App server validates the tweet (length, content policy)
- Generate Tweet ID: A Snowflake-style generator creates a unique, time-sortable ID (sketched after this list)
- Store: The tweet is persisted to the database under that ID
- Fan-out Process: Background worker pushes tweet to followers' timelines
- Cache Update: Update Redis cache for user's profile timeline
- Index for Search: Push to Elasticsearch for searchability
- Notify Mentions: If tweet mentions users, create notifications
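The Snowflake-style generator from the flow above produces 64-bit IDs that sort by creation time, which the timeline tables exploit for clustering. A minimal sketch using the published 41-bit timestamp / 10-bit worker / 12-bit sequence layout (epoch value is Twitter's; error handling is simplified):
public class SnowflakeIdGenerator {
    private static final long EPOCH = 1288834974657L; // Twitter's custom epoch
    private static final long WORKER_ID_BITS = 10L;
    private static final long SEQUENCE_BITS = 12L;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId >= (1L << WORKER_ID_BITS)) {
            throw new IllegalArgumentException("workerId out of range");
        }
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long timestamp = System.currentTimeMillis();
        if (timestamp < lastTimestamp) {
            throw new IllegalStateException("Clock moved backwards");
        }
        if (timestamp == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) { // 4096 IDs this millisecond: wait for the next one
                while (timestamp <= lastTimestamp) {
                    timestamp = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = timestamp;
        // 41-bit timestamp | 10-bit worker | 12-bit sequence
        return ((timestamp - EPOCH) << (WORKER_ID_BITS + SEQUENCE_BITS))
            | (workerId << SEQUENCE_BITS)
            | sequence;
    }
}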
Timeline Generation Flow
- User Requests Timeline: Client sends GET request
- Check Cache: Look up pre-computed timeline in Redis
- If Cache Hit: Return cached tweets
- If Cache Miss:
- Fetch following list
- Fetch recent tweets from each followed user
- Merge and rank tweets
- Cache result
- Fetch Metadata: Hydrate tweets with content and like/retweet counts (see the sketch after this list)
- Return to Client: Send paginated response
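The hydration step above can be a single batched cache read. A sketch of how it could look (service and repository names are illustrative, matching the tweet:{tweetId} cache layout in Section 7):
@Service
public class TweetHydrationService {
    public List<Tweet> hydrate(List<String> tweetIds) {
        List<String> keys = tweetIds.stream()
            .map(id -> "tweet:" + id)
            .collect(Collectors.toList());
        // One MGET for the whole page instead of N round trips
        List<Tweet> cached = redisTemplate.opsForValue().multiGet(keys);

        List<Tweet> result = new ArrayList<>(tweetIds.size());
        for (int i = 0; i < tweetIds.size(); i++) {
            Tweet tweet = (cached != null) ? cached.get(i) : null;
            if (tweet == null) { // cache miss: read through and repopulate
                tweet = tweetRepository.findById(tweetIds.get(i));
                redisTemplate.opsForValue()
                    .set("tweet:" + tweetIds.get(i), tweet, 1, TimeUnit.HOURS);
            }
            result.add(tweet);
        }
        return result;
    }
}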
5. High Level Design (~10-15 minutes)
Architecture Components
Client Layer:
- Web App (React)
- Mobile Apps (iOS/Android)
- Real-time WebSocket connection
API Layer:
- API Gateway - routing, rate limiting, authentication
- Load Balancer - distributes traffic across app servers
- Application Servers - business logic (Spring Boot microservices)
Service Layer (Microservices):
- Tweet Service - CRUD operations for tweets
- Timeline Service - generates and serves timelines
- User Service - user profiles, follow/unfollow
- Notification Service - push notifications
- Search Service - full-text search
- Media Service - handles image/video uploads
Data Layer:
- PostgreSQL - users, follows (relational data)
- Cassandra - tweets, timelines (high write throughput)
- Redis - cache (timelines, user sessions, trending topics)
- Elasticsearch - search index
- S3/CDN - media storage
Messaging & Streaming:
- Kafka - event streaming (tweet events, notifications)
- RabbitMQ/SQS - task queues (fan-out workers)
Background Workers:
- Fan-out Service - pushes tweets to followers' timelines
- Trend Service - computes trending topics
- Analytics Service - processes engagement metrics
6. Architecture Diagram
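A text sketch of the overall architecture, summarizing the request path and data stores from Section 5:
Clients (Web, iOS, Android)
        |  HTTPS (REST) + WebSocket
        v
  API Gateway (auth, rate limiting)
        |
        v
   Load Balancer
        |
        v
+---------------------------------------------------------------------+
| Tweet Svc | Timeline Svc | User Svc | Search Svc | Notif Svc | Media |
+---------------------------------------------------------------------+
     |            |            |            |            |        |
 Cassandra      Redis     PostgreSQL  Elasticsearch    Kafka   S3/CDN
 (tweets,      (cache)   (users,      (search index)     |     (media)
  timelines)              follows)                       v
                                     Fan-out / Trend / Analytics workers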
7. Data Models
PostgreSQL - User & Follow Graph
users:
id (PK),
username (UNIQUE),
email (UNIQUE),
password_hash,
display_name,
bio,
profile_pic_url,
verified BOOLEAN,
follower_count,
following_count,
created_at,
updated_at
follows:
follower_id (FK -> users.id),
followee_id (FK -> users.id),
created_at,
PRIMARY KEY (follower_id, followee_id),
INDEX idx_followee (followee_id),
INDEX idx_follower (follower_id)
Cassandra - Tweets & Timelines
tweets:
tweet_id (PK) UUID,
user_id UUID,
content TEXT,
media_urls LIST<TEXT>,
created_at TIMESTAMP,
PRIMARY KEY (tweet_id)
tweet_counters: // separate table: Cassandra cannot mix counter and regular columns
tweet_id (PK) UUID,
like_count COUNTER,
retweet_count COUNTER,
reply_count COUNTER,
PRIMARY KEY (tweet_id)
user_timeline:
user_id UUID (Partition Key),
tweet_id UUID (Clustering Key),
created_at TIMESTAMP (Clustering Key),
PRIMARY KEY ((user_id), created_at, tweet_id)
WITH CLUSTERING ORDER BY (created_at DESC, tweet_id DESC)
home_timeline:
user_id UUID (Partition Key),
tweet_id UUID (Clustering Key),
tweet_author_id UUID,
created_at TIMESTAMP (Clustering Key),
score DOUBLE, // for ranking
PRIMARY KEY ((user_id), created_at, tweet_id)
WITH CLUSTERING ORDER BY (created_at DESC)
likes:
user_id UUID (Partition Key),
tweet_id UUID (Clustering Key),
created_at TIMESTAMP,
PRIMARY KEY ((user_id), tweet_id)
retweets:
user_id UUID (Partition Key),
original_tweet_id UUID (Clustering Key),
retweet_id UUID,
comment TEXT,
created_at TIMESTAMP,
PRIMARY KEY ((user_id), original_tweet_id)
Redis Cache Structures
# User timeline cache (List)
user:timeline:{userId} -> [tweetId1, tweetId2, ...]
TTL: 15 minutes
# Home timeline cache (Sorted Set)
home:timeline:{userId} -> {tweetId: score}
TTL: 5 minutes
# Tweet metadata cache (Hash)
tweet:{tweetId} -> { content, likeCount, retweetCount, ... }
TTL: 1 hour
# Trending topics (Sorted Set)
trending:global -> {topic: score}
TTL: 5 minutes
# User session
session:{token} -> { userId, username, ... }
TTL: 24 hours
# Following list cache
following:{userId} -> [userId1, userId2, ...]
TTL: 1 hour
Elasticsearch - Search Index
{
"mappings": {
"properties": {
"tweet_id": { "type": "keyword" },
"user_id": { "type": "keyword" },
"username": { "type": "keyword" },
"content": {
"type": "text",
"analyzer": "standard",
"fields": {
"keyword": { "type": "keyword" }
}
},
"hashtags": { "type": "keyword" },
"mentions": { "type": "keyword" },
"created_at": { "type": "date" },
"like_count": { "type": "integer" },
"retweet_count": { "type": "integer" },
"language": { "type": "keyword" }
}
}
}
8. Deep Dives (~10 minutes)
8.1 Timeline Generation (Fan-out Strategies)
Problem: When a user posts a tweet, how do we update followers' timelines efficiently?
Fan-out on Write (Push Model)
How it works:
- When user posts tweet, immediately push to all followers' timelines
- Pre-compute timelines and store in Redis/Cassandra
- Timeline reads are fast (just fetch from cache)
Pros:
- Fast timeline reads (pre-computed)
- Simple timeline service
Cons:
- Slow writes for users with many followers (celebrities)
- Wasted work if followers don't view timeline
- High write amplification
Implementation:
@Service
public class FanoutService {
@Async
public void fanoutTweet(Tweet tweet) {
String userId = tweet.getUserId();
// Get all followers
List<String> followers = followRepository.getFollowers(userId);
// Push to message queue in batches
Lists.partition(followers, 1000).forEach(batch -> {
FanoutTask task = FanoutTask.builder()
.tweetId(tweet.getTweetId())
.followerIds(batch)
.build();
kafkaTemplate.send("fanout-tasks", task);
});
}
}
// Consumer side: the listener must live in its own Spring-managed bean,
// typically deployed on separate worker instances
@Service
public class FanoutWorker {
@KafkaListener(topics = "fanout-tasks")
public void processFanout(FanoutTask task) {
// Write to each follower's home timeline
task.getFollowerIds().forEach(followerId -> {
timelineRepository.addToHomeTimeline(
followerId,
task.getTweetId()
);
// Update Redis cache
redisTemplate.opsForZSet().add(
"home:timeline:" + followerId,
task.getTweetId(),
System.currentTimeMillis()
);
});
}
}
Fan-out on Read (Pull Model)
How it works:
- Don't pre-compute timelines
- When user requests timeline, fetch tweets from all followed users
- Merge and sort on-the-fly
Pros:
- Fast writes
- No wasted work
- Always fresh data
Cons:
- Slow reads (must query many users)
- Complex merge logic
- High read load
Implementation:
@Service
public class TimelineService {
public List<Tweet> getHomeTimeline(String userId, int limit) {
// Get users this user follows
List<String> following = followRepository.getFollowing(userId);
// Fetch recent tweets from each followed user (parallel)
List<CompletableFuture<List<Tweet>>> futures = following.stream()
.map(followeeId -> CompletableFuture.supplyAsync(() ->
tweetRepository.getUserRecentTweets(followeeId, 10)
))
.collect(Collectors.toList());
// Wait for all and merge
List<Tweet> allTweets = futures.stream()
.map(CompletableFuture::join)
.flatMap(List::stream)
.sorted(Comparator.comparing(Tweet::getCreatedAt).reversed())
.limit(limit)
.collect(Collectors.toList());
return allTweets;
}
}
Hybrid Approach (Twitter's Solution)
Strategy:
- Regular users (< 10K followers): Fan-out on write
- Celebrities (≥ 10K followers): Fan-out on read
- Mix both results when serving timeline
Implementation:
@Service
public class HybridTimelineService {
public List<Tweet> getHomeTimeline(String userId, int limit) {
// Part 1: Get pre-computed timeline (from regular users)
List<Tweet> preComputedTweets = cassandraTemplate
.select("SELECT * FROM home_timeline WHERE user_id = ? LIMIT ?",
userId, limit);
// Part 2: Get celebrity tweets (fetch on-demand)
List<String> celebritiesFollowed = followRepository
.getCelebritiesFollowed(userId);
List<Tweet> celebrityTweets = celebritiesFollowed.stream()
.flatMap(celeb ->
tweetRepository.getUserRecentTweets(celeb, 10).stream()
)
.collect(Collectors.toList());
// Merge and sort
List<Tweet> merged = Stream.concat(
preComputedTweets.stream(),
celebrityTweets.stream()
)
.sorted(Comparator.comparing(Tweet::getCreatedAt).reversed())
.distinct()
.limit(limit)
.collect(Collectors.toList());
return merged;
}
}
8.2 Caching Strategy
Multi-Level Cache
L1 - Application Cache (In-Memory):
@Service
public class TweetService {
private final LoadingCache<String, Tweet> tweetCache = Caffeine.newBuilder()
.maximumSize(10_000)
.expireAfterWrite(5, TimeUnit.MINUTES)
.build(tweetId -> tweetRepository.findById(tweetId));
public Tweet getTweet(String tweetId) {
return tweetCache.get(tweetId);
}
}
L2 - Redis Cache (Distributed):
@Cacheable(value = "tweets", key = "#tweetId")
public Tweet getTweetById(String tweetId) {
return cassandraTemplate.selectOne(
"SELECT * FROM tweets WHERE tweet_id = ?",
tweetId
);
}
@Cacheable(value = "home-timeline", key = "#userId")
public List<Tweet> getHomeTimeline(String userId) {
return timelineService.generateTimeline(userId);
}
Cache Warming for Trending Tweets:
@Scheduled(fixedRate = 60000) // Every minute
public void warmTrendingCache() {
List<String> trendingTweetIds = trendService.getTrendingTweetIds();
trendingTweetIds.forEach(tweetId -> {
Tweet tweet = tweetRepository.findById(tweetId);
redisTemplate.opsForValue().set(
"tweet:" + tweetId,
tweet,
1,
TimeUnit.HOURS
);
});
}
Cache Invalidation
@Service
public class TweetService {
@CacheEvict(value = "tweets", key = "#tweetId")
public void deleteTweet(String tweetId) {
tweetRepository.delete(tweetId);
}
// When tweet is liked, update counter cache
public void likeTweet(String userId, String tweetId) {
// Increment in database
cassandraTemplate.execute(
"UPDATE tweets SET like_count = like_count + 1 WHERE tweet_id = ?",
tweetId
);
// Invalidate cache to fetch fresh data
cacheManager.getCache("tweets").evict(tweetId);
// Or update cache directly
redisTemplate.opsForHash().increment(
"tweet:" + tweetId,
"likeCount",
1
);
}
}
8.3 Database Scaling
PostgreSQL (User Service)
Read Replicas:
@Configuration
public class DataSourceConfig {
@Bean
public DataSource routingDataSource() {
Map<Object, Object> targetDataSources = new HashMap<>();
targetDataSources.put("master", masterDataSource());
targetDataSources.put("replica1", replicaDataSource1());
targetDataSources.put("replica2", replicaDataSource2());
ReplicationRoutingDataSource routingDataSource =
new ReplicationRoutingDataSource();
routingDataSource.setTargetDataSources(targetDataSources);
routingDataSource.setDefaultTargetDataSource(masterDataSource());
return routingDataSource;
}
}
@Transactional(readOnly = true)
public User getUserById(String userId) {
// Automatically routes to read replica
return userRepository.findById(userId);
}
@Transactional
public void createUser(User user) {
// Routes to master
userRepository.save(user);
}
Indexing:
-- Improve follow queries
CREATE INDEX idx_follows_follower ON follows(follower_id);
CREATE INDEX idx_follows_followee ON follows(followee_id);
-- Username search
CREATE INDEX idx_users_username ON users(username);
CREATE INDEX idx_users_email ON users(email);
Cassandra (Tweet Service)
Partition Strategy:
-- User timeline: Fast queries for all tweets by a user
CREATE TABLE user_timeline (
user_id UUID,
created_at TIMESTAMP,
tweet_id UUID,
PRIMARY KEY ((user_id), created_at, tweet_id)
) WITH CLUSTERING ORDER BY (created_at DESC);
-- Home timeline: Fast queries for user's feed
CREATE TABLE home_timeline (
user_id UUID,
created_at TIMESTAMP,
tweet_id UUID,
PRIMARY KEY ((user_id), created_at, tweet_id) -- tweet_id avoids collisions at equal timestamps
) WITH CLUSTERING ORDER BY (created_at DESC);
Write Optimization:
@Service
public class TweetRepository {
public void saveTweet(Tweet tweet) {
// Batch writes for better performance
BatchStatement batch = new BatchStatement();
// Insert into tweets table
batch.add(insertTweetStatement(tweet));
// Insert into user_timeline table
batch.add(insertUserTimelineStatement(tweet));
// Execute batch
cassandraTemplate.execute(batch);
}
}
8.4 Handling Viral Tweets
Problem: A celebrity posts a tweet and gets 1M likes in 10 minutes
Counter Aggregation
Challenge: Cassandra counters become hotspot
Solution: Counter Sharding
@Service
public class CounterService {
    private static final int NUM_SHARDS = 10;

    public void incrementLikeCount(String tweetId) {
        int shard = ThreadLocalRandom.current().nextInt(NUM_SHARDS);
        String key = String.format("likes:%s:shard:%d", tweetId, shard);
        redisTemplate.opsForValue().increment(key);
        // Mark the tweet dirty so the periodic flush below picks it up
        redisTemplate.opsForSet().add("likes:dirty", tweetId);
    }

    public long getLikeCount(String tweetId) {
        long total = 0;
        for (int i = 0; i < NUM_SHARDS; i++) {
            String key = String.format("likes:%s:shard:%d", tweetId, i);
            String count = redisTemplate.opsForValue().get(key);
            total += (count != null) ? Long.parseLong(count) : 0;
        }
        return total;
    }

    // Flush aggregated counts every 10 seconds. Spring ignores @Async on
    // private methods invoked from the same bean, so a @Scheduled job is
    // used instead of per-like async aggregation.
    @Scheduled(fixedRate = 10000)
    public void flushShardsToDatabase() {
        Set<String> dirty = redisTemplate.opsForSet().members("likes:dirty");
        redisTemplate.delete("likes:dirty");
        dirty.forEach(tweetId -> {
            long totalLikes = getLikeCount(tweetId);
            // Write to a plain table (e.g. tweet_stats) with a regular column:
            // Cassandra counters cannot be SET to an absolute value
            cassandraTemplate.execute(
                "UPDATE tweet_stats SET like_count = ? WHERE tweet_id = ?",
                totalLikes, tweetId
            );
        });
    }
}
Rate Limiting
@Service
public class RateLimiterService {
@RateLimiter(
name = "tweet-post",
fallbackMethod = "rateLimitExceeded"
)
public Tweet postTweet(String userId, String content) {
return tweetService.createTweet(userId, content);
}
public Tweet rateLimitExceeded(String userId, String content,
RequestNotPermitted e) {
throw new RateLimitException(
"You can only post 100 tweets per hour"
);
}
}
// Configuration
// resilience4j.ratelimiter.instances.tweet-post.limit-for-period=100
// resilience4j.ratelimiter.instances.tweet-post.limit-refresh-period=1h
8.5 Search & Trending
Elasticsearch Integration
Indexing Pipeline:
@Service
public class SearchIndexService {
@KafkaListener(topics = "tweet-events")
public void indexTweet(TweetEvent event) {
Tweet tweet = event.getTweet();
TweetDocument doc = TweetDocument.builder()
.tweetId(tweet.getTweetId())
.userId(tweet.getUserId())
.content(tweet.getContent())
.hashtags(extractHashtags(tweet.getContent()))
.mentions(extractMentions(tweet.getContent()))
.createdAt(tweet.getCreatedAt())
.build();
elasticsearchTemplate.save(doc);
}
private List<String> extractHashtags(String content) {
return Pattern.compile("#(\\w+)")
.matcher(content)
.results()
.map(mr -> mr.group(1))
.collect(Collectors.toList());
}
}
Search Query:
@Service
public class SearchService {
public SearchResults searchTweets(String query, SearchFilters filters) {
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
// Text search
if (query != null && !query.isEmpty()) {
boolQuery.must(QueryBuilders
.multiMatchQuery(query, "content", "hashtags"));
}
// Filters
if (filters.getFromUserId() != null) {
boolQuery.filter(QueryBuilders
.termQuery("user_id", filters.getFromUserId()));
}
if (filters.getStartDate() != null) {
boolQuery.filter(QueryBuilders
.rangeQuery("created_at")
.gte(filters.getStartDate()));
}
// Sort by relevance or recency
SearchQuery searchQuery = new NativeSearchQueryBuilder()
.withQuery(boolQuery)
.withSort(SortBuilders.fieldSort("created_at").order(SortOrder.DESC))
.withPageable(PageRequest.of(0, 20))
.build();
return elasticsearchTemplate.search(searchQuery, TweetDocument.class);
}
}
Trending Topics Calculation
Real-time Trending:
@Service
public class TrendingService {
@Scheduled(fixedRate = 60000) // Every minute
public void calculateTrending() {
Instant now = Instant.now();
Instant oneHourAgo = now.minus(1, ChronoUnit.HOURS);
// Aggregate hashtag counts from Elasticsearch
SearchQuery query = new NativeSearchQueryBuilder()
.withQuery(QueryBuilders.rangeQuery("created_at")
.gte(oneHourAgo)
.lte(now))
.addAggregation(AggregationBuilders
.terms("trending_hashtags")
.field("hashtags")
.size(50)
.order(BucketOrder.count(false)))
.build();
SearchHits<TweetDocument> hits = elasticsearchTemplate
.search(query, TweetDocument.class);
Aggregations aggregations = hits.getAggregations();
Terms terms = aggregations.get("trending_hashtags");
// Store in Redis sorted set
terms.getBuckets().forEach(bucket -> {
String hashtag = bucket.getKeyAsString();
long count = bucket.getDocCount();
redisTemplate.opsForZSet().add(
"trending:global",
hashtag,
count
);
});
// Set expiry
redisTemplate.expire("trending:global", 5, TimeUnit.MINUTES);
}
public List<TrendingTopic> getTrending(int limit) {
Set<ZSetOperations.TypedTuple<String>> trending =
redisTemplate.opsForZSet()
.reverseRangeWithScores("trending:global", 0, limit - 1);
return trending.stream()
.map(tuple -> TrendingTopic.builder()
.topic(tuple.getValue())
.tweetCount(tuple.getScore().longValue())
.build())
.collect(Collectors.toList());
}
}
8.6 Real-time Notifications
WebSocket Connection
@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig implements WebSocketMessageBrokerConfigurer {
@Override
public void registerStompEndpoints(StompEndpointRegistry registry) {
registry.addEndpoint("/ws")
.setAllowedOrigins("*")
.withSockJS();
}
@Override
public void configureMessageBroker(MessageBrokerRegistry config) {
config.enableSimpleBroker("/topic", "/queue");
config.setApplicationDestinationPrefixes("/app");
}
}
@Service
public class NotificationService {
@Autowired
private SimpMessagingTemplate messagingTemplate;
public void sendNotification(String userId, Notification notification) {
// Send via WebSocket to connected clients
messagingTemplate.convertAndSendToUser(
userId,
"/queue/notifications",
notification
);
// Also store in database for offline users
notificationRepository.save(notification);
}
@KafkaListener(topics = "notification-events")
public void processNotificationEvent(NotificationEvent event) {
Notification notification = createNotification(event);
// Check if user is online (WebSocket connection active)
if (isUserOnline(event.getUserId())) {
sendNotification(event.getUserId(), notification);
} else {
// Store for later retrieval
notificationRepository.save(notification);
// Optionally send push notification to mobile
pushNotificationService.send(event.getUserId(), notification);
}
}
}
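The isUserOnline check above is not defined in this document. One way to back it (an assumption, not the original design) is to track WebSocket session lifecycle events in a shared Redis set, so any app server can answer the presence question:
@Component
public class PresenceTracker {
    @Autowired
    private StringRedisTemplate redisTemplate;

    @EventListener
    public void onConnect(SessionConnectedEvent event) {
        Principal user = event.getUser();
        if (user != null) { // anonymous sessions carry no principal
            redisTemplate.opsForSet().add("online:users", user.getName());
        }
    }

    @EventListener
    public void onDisconnect(SessionDisconnectEvent event) {
        Principal user = event.getUser();
        if (user != null) {
            redisTemplate.opsForSet().remove("online:users", user.getName());
        }
    }

    public boolean isUserOnline(String userId) {
        return Boolean.TRUE.equals(
            redisTemplate.opsForSet().isMember("online:users", userId));
    }
}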
Notification Types
@Service
public class NotificationFactory {
public void createLikeNotification(String tweetId, String likerId) {
Tweet tweet = tweetRepository.findById(tweetId);
NotificationEvent event = NotificationEvent.builder()
.userId(tweet.getUserId()) // Tweet author
.type(NotificationType.LIKE)
.actorId(likerId)
.tweetId(tweetId)
.message(String.format("@%s liked your tweet", likerId))
.build();
kafkaTemplate.send("notification-events", event);
}
public void createRetweetNotification(String tweetId, String retweeterId) {
Tweet tweet = tweetRepository.findById(tweetId);
NotificationEvent event = NotificationEvent.builder()
.userId(tweet.getUserId())
.type(NotificationType.RETWEET)
.actorId(retweeterId)
.tweetId(tweetId)
.message(String.format("@%s retweeted your tweet", retweeterId))
.build();
kafkaTemplate.send("notification-events", event);
}
public void createMentionNotification(Tweet tweet, String mentionedUserId) {
NotificationEvent event = NotificationEvent.builder()
.userId(mentionedUserId)
.type(NotificationType.MENTION)
.actorId(tweet.getUserId())
.tweetId(tweet.getTweetId())
.message(String.format("@%s mentioned you", tweet.getUserId()))
.build();
kafkaTemplate.send("notification-events", event);
}
}
8.7 Media Handling
Upload Flow
@RestController
@RequestMapping("/v1/media")
public class MediaController {
@PostMapping("/upload")
public ResponseEntity<MediaUploadResponse> uploadMedia(
@RequestParam("file") MultipartFile file,
@RequestHeader("Authorization") String token
) throws IOException { // MultipartFile.getBytes() below can throw
// Validate file
validateMediaFile(file);
String userId = authService.getUserIdFromToken(token);
String mediaId = UUID.randomUUID().toString();
String fileName = mediaId + getFileExtension(file);
// Upload to S3
s3Client.putObject(
PutObjectRequest.builder()
.bucket("twitter-media")
.key(fileName)
.acl(ObjectCannedACL.PUBLIC_READ)
.build(),
RequestBody.fromBytes(file.getBytes())
);
String mediaUrl = String.format(
"https://cdn.twitter.com/media/%s",
fileName
);
// Store metadata in database
Media media = Media.builder()
.mediaId(mediaId)
.userId(userId)
.url(mediaUrl)
.type(file.getContentType())
.size(file.getSize())
.build();
mediaRepository.save(media);
return ResponseEntity.ok(
MediaUploadResponse.builder()
.mediaId(mediaId)
.url(mediaUrl)
.build()
);
}
private void validateMediaFile(MultipartFile file) {
// Max 5MB for images, 512MB for videos
long maxSize = file.getContentType().startsWith("video")
? 512 * 1024 * 1024
: 5 * 1024 * 1024;
if (file.getSize() > maxSize) {
throw new MediaTooLargeException("File exceeds size limit");
}
// Validate content type
List<String> allowedTypes = Arrays.asList(
"image/jpeg", "image/png", "image/gif",
"video/mp4", "video/quicktime"
);
if (!allowedTypes.contains(file.getContentType())) {
throw new InvalidMediaTypeException("Unsupported media type");
}
}
}
Video Processing
@Service
public class VideoProcessingService {
@KafkaListener(topics = "video-upload-events")
public void processVideo(VideoUploadEvent event) {
String videoUrl = event.getVideoUrl();
// Download from S3
byte[] videoData = s3Client.getObjectAsBytes(
GetObjectRequest.builder()
.bucket("twitter-media")
.key(extractKey(videoUrl))
.build()
).asByteArray(); // ResponseBytes -> raw bytes
// Transcode to multiple formats
List<VideoFormat> formats = Arrays.asList(
new VideoFormat("720p", 2_000_000),
new VideoFormat("480p", 1_000_000),
new VideoFormat("360p", 500_000)
);
formats.forEach(format -> {
byte[] transcoded = ffmpegService.transcode(
videoData,
format
);
String transcodedKey = String.format(
"%s_%s.mp4",
event.getMediaId(),
format.getResolution()
);
s3Client.putObject(
PutObjectRequest.builder()
.bucket("twitter-media")
.key(transcodedKey)
.build(),
RequestBody.fromBytes(transcoded)
);
});
// Update media metadata
mediaRepository.markAsProcessed(event.getMediaId());
}
}
8.8 Security & Privacy
Authentication
@Service
public class AuthService {
public AuthResponse login(String username, String password) {
User user = userRepository.findByUsername(username)
.orElseThrow(() -> new InvalidCredentialsException());
if (!passwordEncoder.matches(password, user.getPasswordHash())) {
throw new InvalidCredentialsException();
}
// Generate JWT token
String token = jwtService.generateToken(user);
// Store session in Redis
redisTemplate.opsForValue().set(
"session:" + token,
user.getUserId(),
24,
TimeUnit.HOURS
);
return AuthResponse.builder()
.token(token)
.userId(user.getUserId())
.expiresIn(86400)
.build();
}
public void logout(String token) {
redisTemplate.delete("session:" + token);
jwtService.invalidateToken(token);
}
}
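The jwtService used above is not defined in this document. Below is a minimal sketch based on the jjwt library (an assumed choice; JwtService is an illustrative name). Note that a stateless JWT cannot itself be revoked, so logout actually takes effect through the Redis session delete in AuthService.
@Service
public class JwtService {
    // In production the key is loaded from configuration and shared across
    // app servers; a random per-instance key is used here for brevity
    private final SecretKey key = Keys.secretKeyFor(SignatureAlgorithm.HS256);

    public String generateToken(User user) {
        return Jwts.builder()
            .setSubject(user.getUserId())
            .setIssuedAt(new Date())
            .setExpiration(new Date(System.currentTimeMillis() + 86_400_000L)) // 24h, matches session TTL
            .signWith(key)
            .compact();
    }

    public boolean validateToken(String token) {
        try {
            Jwts.parserBuilder().setSigningKey(key).build().parseClaimsJws(token);
            return true;
        } catch (JwtException | IllegalArgumentException e) {
            return false; // expired, malformed, or wrong signature
        }
    }
}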
@Component
public class JwtAuthFilter extends OncePerRequestFilter {
@Override
protected void doFilterInternal(
HttpServletRequest request,
HttpServletResponse response,
FilterChain filterChain
) throws ServletException, IOException {
String token = extractToken(request);
if (token != null && jwtService.validateToken(token)) {
String userId = redisTemplate.opsForValue()
.get("session:" + token);
if (userId != null) {
UsernamePasswordAuthenticationToken auth =
new UsernamePasswordAuthenticationToken(
userId, null, Collections.emptyList()
);
SecurityContextHolder.getContext()
.setAuthentication(auth);
}
}
filterChain.doFilter(request, response);
}
}
Content Moderation
@Service
public class ContentModerationService {
public void moderateTweet(Tweet tweet) {
// Check for banned words
if (containsBannedWords(tweet.getContent())) {
tweet.setStatus(TweetStatus.FLAGGED);
humanReviewQueue.add(tweet);
return;
}
// Use ML model for toxic content detection
double toxicityScore = mlService.getToxicityScore(
tweet.getContent()
);
if (toxicityScore > 0.8) {
tweet.setStatus(TweetStatus.FLAGGED);
humanReviewQueue.add(tweet);
} else if (toxicityScore > 0.5) {
tweet.setStatus(TweetStatus.WARNING);
}
tweetRepository.save(tweet);
}
private boolean containsBannedWords(String content) {
Set<String> bannedWords = getBannedWordsFromCache();
return bannedWords.stream()
.anyMatch(word ->
content.toLowerCase().contains(word.toLowerCase())
);
}
}
Rate Limiting by Tier
@Component
public class TierBasedRateLimiter {
public boolean allowRequest(String userId, String operation) {
User user = userRepository.findById(userId);
RateLimitConfig config = getRateLimitConfig(
user.getTier(),
operation
);
String key = String.format("ratelimit:%s:%s", userId, operation);
Long currentCount = redisTemplate.opsForValue()
.increment(key, 1);
if (currentCount == 1) {
redisTemplate.expire(key, config.getWindow(), TimeUnit.SECONDS);
}
return currentCount <= config.getLimit();
}
private RateLimitConfig getRateLimitConfig(UserTier tier, String op) {
// Free tier: 100 tweets/day, 500 API calls/hour
// Premium tier: 1000 tweets/day, 5000 API calls/hour
// Enterprise: Unlimited
if (tier == UserTier.FREE) {
return op.equals("post_tweet")
? new RateLimitConfig(100, 86400) // 100 per day
: new RateLimitConfig(500, 3600); // 500 per hour
} else if (tier == UserTier.PREMIUM) {
return op.equals("post_tweet")
? new RateLimitConfig(1000, 86400)
: new RateLimitConfig(5000, 3600);
} else {
return new RateLimitConfig(Integer.MAX_VALUE, 3600);
}
}
}
8.9 Analytics & Monitoring
Metrics Collection
@Service
public class MetricsService {
@Autowired
private MeterRegistry meterRegistry;
public void recordTweetPost() {
Counter.builder("tweets.posted")
.tag("type", "organic")
.register(meterRegistry)
.increment();
}
public void recordTimelineLoad(long latencyMs) {
Timer.builder("timeline.load.latency")
.register(meterRegistry)
.record(latencyMs, TimeUnit.MILLISECONDS);
}
public void recordCacheHit(String cacheType) {
Counter.builder("cache.hits")
.tag("cache", cacheType)
.register(meterRegistry)
.increment();
}
}
@Aspect
@Component
public class PerformanceMonitoringAspect {
@Around("@annotation(Monitored)")
public Object monitorPerformance(ProceedingJoinPoint joinPoint)
throws Throwable {
long startTime = System.currentTimeMillis();
String methodName = joinPoint.getSignature().getName();
try {
Object result = joinPoint.proceed();
long duration = System.currentTimeMillis() - startTime;
metricsService.recordMethodExecution(methodName, duration, true);
return result;
} catch (Exception e) {
long duration = System.currentTimeMillis() - startTime;
metricsService.recordMethodExecution(methodName, duration, false);
throw e;
}
}
}
Health Checks
@RestController
@RequestMapping("/health")
public class HealthCheckController {
@GetMapping
public ResponseEntity<HealthStatus> healthCheck() {
HealthStatus status = HealthStatus.builder()
.status("UP")
.timestamp(Instant.now())
.checks(performHealthChecks())
.build();
return ResponseEntity.ok(status);
}
private Map<String, ComponentHealth> performHealthChecks() {
Map<String, ComponentHealth> checks = new HashMap<>();
// Database health
checks.put("postgres", checkPostgres());
checks.put("cassandra", checkCassandra());
checks.put("redis", checkRedis());
// External services
checks.put("s3", checkS3());
checks.put("elasticsearch", checkElasticsearch());
return checks;
}
private ComponentHealth checkRedis() {
try {
redisTemplate.opsForValue().get("health-check");
return ComponentHealth.up();
} catch (Exception e) {
return ComponentHealth.down(e.getMessage());
}
}
}
Distributed Tracing
@Configuration
public class TracingConfig {
@Bean
public Tracer jaegerTracer() {
return Configuration.fromEnv("twitter-service")
.getTracer();
}
}
@Service
public class TweetService {
@Autowired
private Tracer tracer;
public Tweet createTweet(String userId, String content) {
Span span = tracer.buildSpan("create-tweet").start();
try (Scope scope = tracer.activateSpan(span)) {
span.setTag("user_id", userId);
span.log("Validating tweet content");
validateTweet(content);
span.log("Saving tweet to database");
Tweet tweet = saveTweet(userId, content);
span.log("Initiating fan-out");
fanoutService.fanoutTweet(tweet);
span.setTag("tweet_id", tweet.getTweetId());
return tweet;
} catch (Exception e) {
span.setTag("error", true);
span.log(ImmutableMap.of("event", "error", "message", e.getMessage()));
throw e;
} finally {
span.finish();
}
}
}
8.10 Disaster Recovery & High Availability
Multi-Region Deployment
# Region Configuration
regions:
- name: us-east-1
primary: true
services:
- tweet-service
- timeline-service
- user-service
databases:
- postgres-master
- cassandra-datacenter-1
- name: us-west-2
primary: false
services:
- tweet-service
- timeline-service
- user-service
databases:
- postgres-replica
- cassandra-datacenter-2
- name: eu-west-1
primary: false
services:
- tweet-service
- timeline-service
databases:
- postgres-replica
- cassandra-datacenter-3
Database Backup Strategy
@Service
public class BackupService {
@Scheduled(cron = "0 0 2 * * *") // Daily at 2 AM
public void performDatabaseBackup() {
// PostgreSQL backup
String timestamp = LocalDateTime.now()
.format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss"));
// -f writes straight to a file; shell redirection would not work when the
// command is exec'd without a shell
String backupCommand = String.format(
"pg_dump -h %s -U %s -f backup_%s.sql %s",
dbHost, dbUser, timestamp, dbName
);
executeCommand(backupCommand);
// Upload to S3
s3Client.putObject(
PutObjectRequest.builder()
.bucket("twitter-backups")
.key("postgres/backup_" + timestamp + ".sql")
.build(),
Paths.get("backup_" + timestamp + ".sql")
);
// Cassandra snapshot
cassandraBackupService.createSnapshot(timestamp);
}
@Scheduled(cron = "0 0 3 * * SUN") // Weekly on Sunday at 3 AM
public void cleanOldBackups() {
Instant thirtyDaysAgo = Instant.now().minus(30, ChronoUnit.DAYS);
// Delete backups older than 30 days
ListObjectsV2Response response = s3Client.listObjectsV2(
ListObjectsV2Request.builder()
.bucket("twitter-backups")
.build()
);
response.contents().stream()
.filter(obj -> obj.lastModified().isBefore(thirtyDaysAgo))
.forEach(obj -> s3Client.deleteObject(
DeleteObjectRequest.builder()
.bucket("twitter-backups")
.key(obj.key())
.build()
));
}
}
Circuit Breaker for External Services
@Service
public class ExternalServiceClient {
@CircuitBreaker(
name = "media-service",
fallbackMethod = "fallbackUploadMedia"
)
@Retry(name = "media-service", fallbackMethod = "fallbackUploadMedia")
public String uploadMedia(MultipartFile file) {
return mediaServiceClient.upload(file);
}
public String fallbackUploadMedia(MultipartFile file, Exception e) {
log.error("Media upload failed, storing locally", e);
// Store locally and queue for retry
String localPath = storageService.saveLocally(file);
retryQueue.add(new MediaUploadTask(localPath));
return localPath;
}
@CircuitBreaker(
name = "notification-service",
fallbackMethod = "fallbackNotification"
)
public void sendNotification(Notification notification) {
notificationServiceClient.send(notification);
}
public void fallbackNotification(Notification notification, Exception e) {
// Store in database for later delivery
notificationRepository.saveForRetry(notification);
}
}
// Configuration
// resilience4j.circuitbreaker.instances.media-service.failure-rate-threshold=50
// resilience4j.circuitbreaker.instances.media-service.wait-duration-in-open-state=60s
// resilience4j.circuitbreaker.instances.media-service.sliding-window-size=10
Summary
Key Design Decisions
- Hybrid Fan-out Strategy: Regular users get fan-out on write, celebrities use fan-out on read
- Polyglot Persistence: PostgreSQL (users/follows), Cassandra (tweets/timelines), Redis (cache), Elasticsearch (search)
- Event-Driven Architecture: Kafka for async processing (notifications, analytics, fan-out)
- Multi-Level Caching: Application cache → Redis → Database
- Read Replicas: Separate read/write traffic for PostgreSQL
- Counter Sharding: Handle viral tweets with distributed counters
- WebSocket: Real-time notifications for online users
- CDN: Media delivery with global edge locations
Scalability Achieved
- ✅ 500M users, 200M DAU
- ✅ 4600 tweets per second
- ✅ 46K timeline reads per second
- ✅ Timeline load < 500ms (P95)
- ✅ 99.99% availability
- ✅ Handles viral tweets (millions of interactions)
Trade-offs & Considerations
| Decision | Pro | Con |
|---|---|---|
| Fan-out on Write | Fast reads, simple timeline service | Slow writes for celebrities, wasted work |
| Fan-out on Read | Fast writes, always fresh | Slow reads, complex merge logic |
| Hybrid Approach | Balance of both | Increased complexity |
| Cassandra for Tweets | Linear scalability, high write throughput | Eventual consistency, limited queries |
| Elasticsearch for Search | Fast full-text search, faceted search | Additional maintenance, sync lag |
| Async Notifications | Non-blocking, scalable | Possible delay, delivery not guaranteed |
| Counter Sharding | Handles hotspots | Eventually consistent counts |
Future Enhancements
- Spaces (Audio Rooms): Live audio conversations
- Communities: Topic-based groups and forums
- Newsletter Integration: Long-form content via Revue
- E-commerce: In-app shopping and product pages
- Super Follows: Paid subscriptions for exclusive content
- Twitter Blue: Premium features (edit tweets, longer videos)
- AI Moderation: Better automated content filtering
- Decentralized Architecture: Explore blockchain/ActivityPub
- Edge Timeline Generation: Compute timelines at CDN edge
- Video Live Streaming: Native live broadcasting
Additional Considerations
Data Privacy & GDPR Compliance
@Service
public class DataPrivacyService {
public void deleteUserData(String userId) {
// Delete user account
userRepository.delete(userId);
// Delete or anonymize tweets
tweetRepository.anonymizeTweetsByUser(userId);
// Remove from followers/following
followRepository.deleteAllByUser(userId);
// Clear cache
cacheService.evictUser(userId);
// Export user data (GDPR right to data portability)
dataExportService.queueExport(userId);
}
public UserDataExport exportUserData(String userId) {
return UserDataExport.builder()
.user(userRepository.findById(userId))
.tweets(tweetRepository.findAllByUser(userId))
.likes(likeRepository.findAllByUser(userId))
.followers(followRepository.getFollowers(userId))
.following(followRepository.getFollowing(userId))
.build();
}
}
API Versioning
@RestController
@RequestMapping("/v2/tweets")
public class TweetControllerV2 {
// V2 API with enhanced features
@PostMapping
public ResponseEntity<TweetResponse> createTweet(
@RequestBody TweetRequest request
) {
// New features: polls, edit history, etc.
return ResponseEntity.ok(tweetService.createTweetV2(request));
}
}
// Keep V1 for backward compatibility
@RestController
@RequestMapping("/v1/tweets")
@Deprecated
public class TweetControllerV1 {
// Legacy API
}